Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jégou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim to increase robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sums of differences between the LFCs
and the CLFCs are aggregated to generate an extremely compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions.
Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME 2015, Torino, Italy
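The aggregation pipeline described above (per-frame SIFT vectors, clustered into local feature centers, then encoded as sum-of-differences residuals against training-set CLFCs) can be sketched in a few lines of NumPy. This is a minimal illustrative toy, assuming plain k-means for both clustering stages and small random vectors in place of real SIFT descriptors; all function names are our own, not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; stands in for the clustering steps in the abstract."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def vlac_encode(sift_features, clfcs, n_lfcs=8):
    """VLAC sketch: cluster a segment's features into local feature centers
    (LFCs), then aggregate LFC-minus-CLFC residuals per nearest CLFC,
    mirroring what VLAD does one level down, on raw descriptors."""
    lfcs = kmeans(sift_features, n_lfcs)                      # coarser level
    nearest = np.argmin(((lfcs[:, None] - clfcs[None]) ** 2).sum(-1), axis=1)
    desc = np.zeros_like(clfcs)
    for i, j in enumerate(nearest):
        desc[j] += lfcs[i] - clfcs[j]                         # sum of differences
    v = desc.ravel()
    return v / (np.linalg.norm(v) + 1e-12)                    # L2-normalize

# toy usage: 200 fake 16-D "SIFT" vectors, 4 CLFCs from a fake "training set"
rng = np.random.default_rng(1)
clfcs = kmeans(rng.normal(size=(500, 16)), 4)
code = vlac_encode(rng.normal(size=(200, 16)), clfcs)
```

The resulting code has dimension (number of CLFCs) x (descriptor dimension), so compactness is controlled by how few CLFCs are kept.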
Leaping Into Memories: Space-Time Deep Feature Synthesis
The success of deep learning models has led to their adaptation and adoption
by prominent video understanding methods. The majority of these approaches
encode features in a joint space-time modality for which the inner workings and
learned representations are difficult to visually interpret. We propose LEArned
Preconscious Synthesis (LEAPS), an architecture-agnostic method for
synthesizing videos from the internal spatiotemporal representations of models.
Using a stimulus video and a target class, we prime a fixed space-time model
and iteratively optimize a video initialized with random noise. We incorporate
additional regularizers to improve the feature diversity of the synthesized
videos as well as the cross-frame temporal coherence of motions. We
quantitatively and qualitatively evaluate the applicability of LEAPS by
inverting a range of spatiotemporal convolutional and attention-based
architectures trained on Kinetics-400, which to the best of our knowledge has
not been previously accomplished.
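The optimization loop sketched in the abstract (prime a frozen model with a target class, start from random noise, and iteratively ascend the class score while regularizing cross-frame coherence) can be illustrated end-to-end with a toy differentiable stand-in, since the real method inverts large pretrained networks. Everything below, the linear "model", the tiny video size, and the coherence penalty, is an illustrative assumption, not the LEAPS implementation.

```python
import numpy as np

# Toy stand-in for a fixed space-time model: a linear scorer over a
# flattened clip, score(v) = w @ v.ravel(), with an analytic gradient.
rng = np.random.default_rng(0)
T, H, W = 4, 8, 8                       # tiny "video": frames x height x width
w = rng.normal(size=T * H * W)          # frozen weights for the target class

def score(v):
    return w @ v.ravel()

def synthesize(steps=200, lr=0.01, lam=0.1):
    """Gradient-ascend the class score from noise, with a cross-frame
    temporal-coherence penalty lam * sum((v[t+1] - v[t])**2)."""
    v = rng.normal(scale=0.1, size=(T, H, W))   # start from random noise
    for _ in range(steps):
        g_score = w.reshape(T, H, W)            # d(score)/dv for the linear model
        d = v[1:] - v[:-1]                      # gradient of the coherence penalty
        g_tc = np.zeros_like(v)
        g_tc[1:] += 2 * d
        g_tc[:-1] -= 2 * d
        v += lr * (g_score - lam * g_tc)        # ascend score, keep frames coherent
    return v

v = synthesize()
```

With a real network the analytic gradient would be replaced by backpropagation through the frozen model, but the loop structure is the same.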
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Natural Language Explanations (NLE) aim at supplementing the prediction of a
model with human-friendly natural text. Existing NLE approaches involve
training separate models for each downstream task. In this work, we propose
Uni-NLX, a unified framework that consolidates all NLE tasks into a single and
compact multi-task model using a unified training objective of text generation.
Additionally, we introduce two new NLE datasets: 1) ImageNetX, a dataset of
144K samples for explaining ImageNet categories, and 2) VQA-ParaX, a dataset of
123K samples for explaining the task of Visual Question Answering (VQA). Both
datasets are derived leveraging large language models (LLMs). By training on
the 1M combined NLE samples, our single unified framework is capable of
simultaneously performing seven NLE tasks including VQA, visual recognition and
visual reasoning tasks with 7X fewer parameters, demonstrating comparable
performance to the independent task-specific models in previous approaches, and
in certain tasks even outperforming them. Code is at
https://github.com/fawazsammani/uni-nlx
Comment: Accepted to ICCVW 202
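Consolidating heterogeneous NLE tasks into one text-generation objective, as described above, amounts to casting every sample as a (prompt, target) text pair with a task prefix telling the single model which task it is serving. The sketch below shows this idea only; the prefix strings and field names are illustrative assumptions, not Uni-NLX's actual data format.

```python
def to_unified_sample(task, inputs, answer, explanation):
    """Render any NLE sample as one text-to-text pair: a task-prefixed
    prompt and an 'answer because explanation' target, so a single
    generator can be trained on all tasks at once."""
    prompt = f"[{task}] " + " ".join(f"{k}: {v}" for k, v in inputs.items())
    target = f"{answer} because {explanation}"
    return prompt, target

# Two tasks, one sample format (contents are made-up examples):
vqa = to_unified_sample(
    "vqa",
    {"question": "What is the man holding?"},
    "a surfboard",
    "he is carrying a long board under his arm on the beach",
)
rec = to_unified_sample(
    "recognition",
    {"label": "golden retriever"},
    "golden retriever",
    "the dog has a long golden coat and a broad head",
)
```

Because every task shares this single format, one model and one loss cover all of them, which is what allows the 7x parameter reduction over per-task models.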
Fast Desynchronization For Decentralized Multichannel Medium Access Control
Distributed desynchronization algorithms are key to wireless sensor networks
as they allow for medium access control in a decentralized manner. In this
paper, we view desynchronization primitives as iterative methods that solve
optimization problems. In particular, by formalizing a well-established
desynchronization algorithm as a gradient descent method, we establish novel
upper bounds on the number of iterations required to reach convergence.
Moreover, by using Nesterov's accelerated gradient method, we propose a novel
desynchronization primitive that provides for faster convergence to the steady
state. Importantly, we propose a novel algorithm that leads to decentralized
time-synchronous multichannel TDMA coordination by formulating this task as an
optimization problem. Our simulations and experiments on a densely-connected
IEEE 802.15.4-based wireless sensor network demonstrate that our scheme
provides for faster convergence to the steady state, robustness to hidden
nodes, higher network throughput and comparable power dissipation with respect
to the recently standardized IEEE 802.15.4e-2012 time-synchronized channel
hopping (TSCH) scheme.
Comment: to appear in IEEE Transactions on Communications
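The gradient-descent view of desynchronization can be made concrete with the classic midpoint update: each node pulls its firing phase toward the midpoint of its two phase neighbors on the unit circle, which is exactly a descent step on the sum of squared deviations from fair spacing. The sketch below simulates that plain primitive; the paper's Nesterov-accelerated variant would add a momentum lookahead to the same update. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def desync_step(theta, alpha=0.5):
    """One synchronous round of the midpoint (Desync-style) update:
    each node moves its phase toward the midpoint of its two phase
    neighbors, with wrap-around on the unit circle."""
    n = len(theta)
    order = np.argsort(theta)
    s = theta[order]
    new = s.copy()
    for k in range(n):
        prev = s[k - 1] if k > 0 else s[-1] - 1.0   # neighbor one period back
        nxt = s[k + 1] if k < n - 1 else s[0] + 1.0 # neighbor one period ahead
        new[k] = (1 - alpha) * s[k] + alpha * (prev + nxt) / 2.0
    out = np.empty(n)
    out[order] = np.mod(new, 1.0)
    return out

def gaps(theta):
    """Circular inter-fire gaps; fair TDMA means all gaps equal 1/n."""
    s = np.sort(theta)
    return np.diff(np.concatenate([s, [s[0] + 1.0]]))

rng = np.random.default_rng(0)
theta = rng.random(8)          # 8 nodes, random initial firing phases
for _ in range(200):
    theta = desync_step(theta)
```

After enough rounds the gaps equalize at 1/n, i.e. the nodes have self-organized into a fair TDMA schedule with no coordinator.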
Convergence of Desynchronization Primitives in Wireless Sensor Networks: A Stochastic Modeling Approach
Desynchronization approaches in wireless sensor networks converge to
time-division multiple access (TDMA) of the shared medium without requiring
clock synchronization amongst the wireless sensors, or indeed the presence of a
central (coordinator) node. All such methods are based on the principle of
reactive listening of periodic "fire" or "pulse" broadcasts: each node updates
the time of its fire message broadcasts based on received fire messages from
some of the remaining nodes sharing the given spectrum. In this paper, we
present a novel framework to estimate the required iterations for convergence
to fair TDMA scheduling. Our estimates are fundamentally different from
previous conjectures or bounds found in the literature as, for the first time,
convergence to TDMA is defined in a stochastic sense. Our analytic results
apply to the Desync algorithm and to pulse-coupled oscillator algorithms with
inhibitory coupling. The experimental evaluation via iMote2 TinyOS nodes (based
on the IEEE 802.15.4 standard) as well as via computer simulations demonstrates
that, for the vast majority of settings, our stochastic model is within one
standard deviation from the experimentally-observed convergence iterations. The
proposed estimates are thus shown to characterize the desynchronization
convergence iterations significantly better than existing conjectures or
bounds. Therefore, they contribute towards the analytic understanding of how a
desynchronization-based system is expected to evolve from random initial
conditions to the desynchronized steady state.
Comment: to appear, IEEE Transactions on Signal Processing, 201
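The quantity this abstract models, the number of iterations from random initial phases to fair TDMA, is itself random, so a natural baseline is a Monte Carlo estimate of its mean and spread. The sketch below counts convergence rounds for a synchronous midpoint update written directly on the inter-fire gaps; the update form, tolerance, and trial count are simplifying assumptions of this illustration, not the paper's model.

```python
import numpy as np

def desync_gaps_step(g, alpha=0.5):
    # One synchronous round of the midpoint update, expressed on the
    # circular gap vector (which always sums to one period): each gap
    # is pulled toward the average of its two circular neighbors.
    return (1 - alpha) * g + (alpha / 2) * (np.roll(g, 1) + np.roll(g, -1))

def iterations_to_tdma(n, tol=0.01, alpha=0.5, rng=None):
    """Rounds until every gap is within tol of the fair share 1/n,
    starting from n uniformly random firing phases."""
    rng = rng or np.random.default_rng()
    phases = np.sort(rng.random(n))
    g = np.diff(np.concatenate([phases, [phases[0] + 1.0]]))
    it = 0
    while np.abs(g - 1.0 / n).max() > tol:
        g = desync_gaps_step(g, alpha)
        it += 1
    return it

# Monte Carlo over random initial conditions: the convergence-iteration
# distribution that a stochastic model of desynchronization describes.
rng = np.random.default_rng(0)
trials = [iterations_to_tdma(8, rng=rng) for _ in range(200)]
mean, std = np.mean(trials), np.std(trials)
```

Comparing such empirical means and standard deviations against an analytic estimate is precisely the kind of validation the abstract reports (model within one standard deviation of observed iterations).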
Visualizing and Understanding Contrastive Learning
Contrastive learning has revolutionized the field of computer vision,
learning rich representations from unlabeled data, which generalize well to
diverse vision tasks. Consequently, it has become increasingly important to
explain these approaches and understand their inner workings. Given
that contrastive models are trained with interdependent and interacting inputs
and aim to learn invariance through data augmentation, the existing methods for
explaining single-image systems (e.g., image classification models) are
inadequate as they fail to account for these factors. Additionally, there is a
lack of evaluation metrics designed to assess pairs of explanations, and no
analytical studies have been conducted to investigate the effectiveness of
different techniques used to explain contrastive learning. In this work, we
design visual explanation methods that contribute towards understanding
similarity learning tasks from pairs of images. We further adapt existing
metrics, used to evaluate visual explanations of image classification systems,
to suit pairs of explanations and evaluate our proposed methods with these
metrics. Finally, we present a thorough analysis of visual explainability
methods for contrastive learning, establish their correlation with downstream
tasks and demonstrate the potential of our approaches to investigate their
merits and drawbacks.
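One simple way to explain a similarity model for a *pair* of inputs, rather than a single image, is occlusion: mask one patch of image A at a time and record how much the pair's embedding similarity drops. The sketch below shows that pairwise variant; the quadrant-mean "encoder" is a toy stand-in for a contrastive model, and the method is a generic illustration, not the specific techniques proposed in this work.

```python
import numpy as np

def encode(img):
    # Toy embedding: per-quadrant means of an (H, W) image; a real
    # contrastive encoder would go here instead.
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return np.array([img[:h, :w].mean(), img[:h, w:].mean(),
                     img[h:, :w].mean(), img[h:, w:].mean()])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def pairwise_occlusion_map(img_a, img_b, patch=4):
    """Saliency for the PAIR: similarity drop when each patch of
    img_a is occluded, holding img_b's embedding fixed."""
    base = cosine(encode(img_a), encode(img_b))
    z_b = encode(img_b)
    sal = np.zeros_like(img_a)
    for i in range(0, img_a.shape[0], patch):
        for j in range(0, img_a.shape[1], patch):
            occ = img_a.copy()
            occ[i:i + patch, j:j + patch] = 0.0       # occlude one patch
            sal[i:i + patch, j:j + patch] = base - cosine(encode(occ), z_b)
    return sal

rng = np.random.default_rng(0)
img_a = rng.random((8, 8))
img_b = img_a + 0.05 * rng.random((8, 8))             # a "positive" pair
sal = pairwise_occlusion_map(img_a, img_b)
```

Because the explanation is conditioned on the second image, it captures the interdependence of inputs that single-image attribution methods miss, which is the gap the abstract identifies.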